natural product
Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
Xiong, Ziyu, Zhang, Yichi, Alauddin, Foyez, Cheng, Chu Xin, An, Joon Soo, Seyedsayamdost, Mohammad R., Zhong, Ellen D.
Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural products and clinical therapeutics. Yet, interpreting NMR spectra remains a time-consuming, manual process requiring extensive domain expertise. We introduce ChefNMR (CHemical Elucidation From NMR), an end-to-end framework that directly predicts an unknown molecule's structure solely from its 1D NMR spectra and chemical formula. We frame structure elucidation as conditional generation from an atomic diffusion model built on a non-equivariant transformer architecture. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. ChefNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%. This work takes a significant step toward solving the grand challenge of automating small-molecule structure elucidation and highlights the potential of deep learning in accelerating molecular discovery. Code is available at https://github.com/ml-struct-bio/chefnmr.
NaFM: Pre-training a Foundation Model for Small-Molecule Natural Products
Ding, Yuheng, Wang, Yusong, Qiang, Bo, Yu, Jie, Li, Qi, Zhou, Yiran, Liu, Zhenmin
Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Nowadays, existing deep learning methods for natural products research primarily rely on supervised learning approaches designed for specific downstream tasks. However, such one-model-for-a-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods are not well-suited for the unique tasks associated with natural products. To address these limitations, we have pre-trained a foundation model for natural products based on their unique properties. Our approach employs a novel pretraining strategy that is especially tailored to natural products. By incorporating contrastive learning and masked graph learning objectives, we emphasize evolutional information from molecular scaffolds while capturing side-chain information. Our framework achieves state-of-the-art (SOTA) results in various downstream tasks related to natural product mining and drug discovery. We first compare taxonomy classification with synthesized molecule-focused baselines to demonstrate that current models are inadequate for understanding natural synthesis. Furthermore, by diving into a fine-grained analysis at both the gene and microbial levels, NaFM demonstrates the ability to capture evolutionary information. Eventually, our method is experimented with virtual screening, illustrating informative natural product representations that can lead to more effective identification of potential drug candidates.
- Asia > China > Shaanxi Province > Xi'an (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > Germany > Rheinland-Pfalz > Mainz (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.93)
Accelerating Antibiotic Discovery with Large Language Models and Knowledge Graphs
Delmas, Maxime, Wysocka, Magdalena, Gusicuma, Danilo, Freitas, André
The discovery of novel antibiotics is critical to address the growing antimicrobial resistance (AMR). However, pharmaceutical industries face high costs (over $1 billion), long timelines, and a high failure rate, worsened by the rediscovery of known compounds. We propose an LLM-based pipeline that acts as an alarm system, detecting prior evidence of antibiotic activity to prevent costly rediscoveries. The system integrates organism and chemical literature into a Knowledge Graph (KG), ensuring taxonomic resolution, synonym handling, and multi-level evidence classification. We tested the pipeline on a private list of 73 potential antibiotic-producing organisms, disclosing 12 negative hits for evaluation. The results highlight the effectiveness of the pipeline for evidence reviewing, reducing false negatives, and accelerating decision-making. The KG for negative hits and the user interface for interactive exploration will be made publicly available.
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > Switzerland (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (2 more...)
- Research Report (0.50)
- Overview (0.47)
FragFM: Efficient Fragment-Based Molecular Generation via Discrete Flow Matching
Lee, Joongwon, Kim, Seonghwan, Kim, Wou Youn
A BSTRACT We introduce FragFM, a novel fragment-based discrete flow matching framework for molecular graph generation. FragFM generates molecules at the fragment level, leveraging a coarse-to-fine autoencoding mechanism to reconstruct atom-level details. This approach reduces computational complexity while maintaining high chemical validity, enabling more efficient and scalable molecular generation. Notably, FragFM achieves over 99% validity with significantly fewer sampling steps, improving scalability while preserving molecular diversity. These results highlight the potential of fragment-based generative modeling for large-scale, property-aware molecular design, paving the way for more efficient exploration of chemical space. 1 I NTRODUCTION Deep generative models, such as diffusion and flow matching, have demonstrated remarkable success across domains like images (Nichol et al., 2021; Rombach et al., 2022; Ho et al., 2020), text (Li et al., 2022), and videos (Hu & Xu, 2023; Ho et al., 2022). Recently, their application to molecular graph generation has gained attention, where they aim to generate chemically valid molecules by leveraging the structural properties of molecular graphs (Jo et al., 2022; Vignac et al., 2022; Qin et al., 2024). However, existing atom-based generative models face scalability challenges, particularly in generating large and complex molecules. The quadratic growth of edges as graph size increases results in computational inefficiencies. At the same time, the inherent sparsity of chemical bonds makes accurate edge prediction more complex, often leading to unrealistic molecular structures or invalid connectivity constraints (Qin et al., 2023; Chen et al., 2023). Additionally, graph neural networks (GNNs) struggle to capture topological features such as rings and loops, leading to deviations from chemically valid structures. While various methods incorporate auxiliary features (e.g., spectral, ring, and valency information) to mitigate these issues, they do not fully resolve the sparsity and scalability bottlenecks (Vignac et al., 2022).
NPGPT: Natural Product-Like Compound Generation with GPT-based Chemical Language Models
Sakano, Koh, Furui, Kairi, Ohue, Masahito
Natural products are substances produced by organisms in nature and often possess biological activity and structural diversity. Drug development based on natural products has been common for many years. However, the intricate structures of these compounds present challenges in terms of structure determination and synthesis, particularly compared to the efficiency of high-throughput screening of synthetic compounds. In recent years, deep learning-based methods have been applied to the generation of molecules. In this study, we trained chemical language models on a natural product dataset and generated natural product-like compounds. The results showed that the distribution of the compounds generated was similar to that of natural products. We also evaluated the effectiveness of the generated compounds as drug candidates. Our method can be used to explore the vast chemical space and reduce the time and cost of drug discovery of natural products.
- Europe > Germany > Rheinland-Pfalz > Mainz (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
ShEPhERD: Diffusing shape, electrostatics, and pharmacophores for bioisosteric drug design
Adams, Keir, Abeywardane, Kento, Fromer, Jenna, Coley, Connor W.
Engineering molecules to exhibit precise 3D intermolecular interactions with their environment forms the basis of chemical design. In ligand-based drug design, bioisosteric analogues of known bioactive hits are often identified by virtually screening chemical libraries with shape, electrostatic, and pharmacophore similarity scoring functions. We instead hypothesize that a generative model which learns the joint distribution over 3D molecular structures and their interaction profiles may facilitate 3D interaction-aware chemical design. We specifically design ShEPhERD, an SE(3)-equivariant diffusion model which jointly diffuses/denoises 3D molecular graphs and representations of their shapes, electrostatic potential surfaces, and (directional) pharmacophores to/from Gaussian noise. Inspired by traditional ligand discovery, we compose 3D similarity scoring functions to assess ShEPhERD's ability to conditionally generate novel molecules with desired interaction profiles. We demonstrate ShEPhERD's potential for impact via exemplary drug design tasks including natural product ligand hopping, protein-blind bioactive hit diversification, and bioisosteric fragment merging.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > Middle East > Jordan (0.04)
How A.I. Teaches Machines to Discover Drugs
When I first became a doctor, I cared for an older man whom I'll call Ted. He was so sick with pneumonia that he was struggling to breathe. His primary-care physician had prescribed one antibiotic after another, but his symptoms had only worsened; by the time I saw him in the hospital, he had a high fever and was coughing up blood. His lungs seemed to be infected with methicillin-resistant Staphylococcus aureus (MRSA), a bacterium so hardy that few drugs can kill it. I placed an oxygen tube in his nostrils, and one of my colleagues inserted an I.V. into his arm. We decided to give him vancomycin, a last line of defense against otherwise untreatable infections.
- Europe > Jersey (0.15)
- North America > United States > New Jersey (0.05)
- South America > Argentina (0.04)
- (6 more...)
Domain-Agnostic Molecular Generation with Self-feedback
Fang, Yin, Zhang, Ningyu, Chen, Zhuo, Guo, Lingbing, Fan, Xiaohui, Chen, Huajun
The generation of molecules with desired properties has gained tremendous popularity, revolutionizing the way scientists design molecular structures and providing valuable support for chemical and drug design. However, despite the potential of language models in molecule generation, they face numerous challenges such as the generation of syntactically or chemically flawed molecules, narrow domain focus, and limitations in creating diverse and directionally feasible molecules due to a dearth of annotated data or external molecular databases. To tackle these challenges, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation. Through the reconstruction of over 100 million molecular SELFIES, MolGen internalizes profound structural and grammatical insights. This is further enhanced by domain-agnostic molecular prefix tuning, fostering robust knowledge transfer across diverse domains. Importantly, our self-feedback paradigm steers the model away from ``molecular hallucinations'', ensuring alignment between the model's estimated probabilities and real-world chemical preferences. Extensive experiments on well-known benchmarks underscore MolGen's optimization capabilities in properties such as penalized logP, QED, and molecular docking. Additional analyses affirm its proficiency in accurately capturing molecule distributions, discerning intricate structural patterns, and efficiently exploring the chemical space. Code is available at https://github.com/zjunlp/MolGen.
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Developing a Knowledge Graph Framework for Pharmacokinetic Natural Product-Drug Interactions
Taneja, Sanya B., Callahan, Tiffany J., Paine, Mary F., Kane-Gill, Sandra L., Kilicoglu, Halil, Joachimiak, Marcin P., Boyce, Richard D.
Pharmacokinetic natural product-drug interactions (NPDIs) occur when botanical natural products are co-consumed with pharmaceutical drugs. Understanding mechanisms of NPDIs is key to preventing adverse events. We constructed a knowledge graph framework, NP-KG, as a step toward computational discovery of pharmacokinetic NPDIs. NP-KG is a heterogeneous KG with biomedical ontologies, linked data, and full texts of the scientific literature, constructed with the Phenotype Knowledge Translator framework and the semantic relation extraction systems, SemRep and Integrated Network and Dynamic Reasoning Assembler. NP-KG was evaluated with case studies of pharmacokinetic green tea- and kratom-drug interactions through path searches and meta-path discovery to determine congruent and contradictory information compared to ground truth data. The fully integrated NP-KG consisted of 745,512 nodes and 7,249,576 edges. Evaluation of NP-KG resulted in congruent (38.98% for green tea, 50% for kratom), contradictory (15.25% for green tea, 21.43% for kratom), and both congruent and contradictory (15.25% for green tea, 21.43% for kratom) information. Potential pharmacokinetic mechanisms for several purported NPDIs, including the green tea-raloxifene, green tea-nadolol, kratom-midazolam, kratom-quetiapine, and kratom-venlafaxine interactions were congruent with the published literature. NP-KG is the first KG to integrate biomedical ontologies with full texts of the scientific literature focused on natural products. We demonstrate the application of NP-KG to identify pharmacokinetic interactions involving enzymes, transporters, and pharmaceutical drugs. We envision that NP-KG will facilitate improved human-machine collaboration to guide researchers in future studies of pharmacokinetic NPDIs. The NP-KG framework is publicly available at https://doi.org/10.5281/zenodo.6814507 and https://github.com/sanyabt/np-kg.
- North America > United States > Missouri > Jackson County > Kansas City (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Washington > Spokane County > Spokane (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Harnessing AI to Discover New Drugs: Rewriting the Rulebook for Pharmaceutical Research
Artificial intelligence (AI) is able to recognize the biological activity of natural products in a targeted manner, as researchers at ETH Zurich have demonstrated. Moreover, AI helps to find molecules that have the same effect as a natural substance but are easier to manufacture. This opens up huge possibilities for drug discovery, which also has potential to rewrite the rulebook for pharmaceutical research. Nature has a vast store of medicinal substances. "Over 50 percent of all drugs today are inspired by nature," says Gisbert Schneider, Professor of Computer- Assisted Drug Design at ETH Zurich.